Skip to content

Gene.bordegaray/2026/02/dyn filter partition indexed#20246

Draft
gene-bordegaray wants to merge 7 commits intoapache:mainfrom
gene-bordegaray:gene.bordegaray/2026/02/dyn_filter_partition_indexed
Draft

Gene.bordegaray/2026/02/dyn filter partition indexed#20246
gene-bordegaray wants to merge 7 commits intoapache:mainfrom
gene-bordegaray:gene.bordegaray/2026/02/dyn_filter_partition_indexed

Conversation

@gene-bordegaray
Copy link
Contributor

Which issue does this PR close?

  • Closes #.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added physical-expr Changes to the physical-expr crates core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) datasource Changes to the datasource crate physical-plan Changes to the physical-plan crate labels Feb 9, 2026
Copy link
Contributor

@NGA-TRAN NGA-TRAN left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR makes sense. My main comments:

  1. You may add document about pruning and filtering and why we need partition-index if not yet documented somewhere
  2. I think there is a typo in test data
  3. I am unclear whether recursively searching RepartitionExec is the best strategy and whether it will introduce new bug. Maybe some examples and explanation will help.
  4. I would ask someone that know dynamic filtering well to review this. Adrian and Lia? Maybe they will help explain the recursive walk, too

@github-actions github-actions bot added documentation Improvements or additions to documentation optimizer Optimizer rules common Related to common crate proto Related to proto crate labels Feb 11, 2026
Copy link
Contributor

@LiaCastaneda LiaCastaneda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense to me and will be very helpful for use cases where we want to avoid repartitioning data. My only concern is that API users would need to align the probe and build side partitions, but this seems like a reasonable tradeoff. Let’s see what other contributors think.

Comment on lines +1002 to +1003
/// CASE-hash routing, but this assumes build/probe partition indices stay aligned for
/// dynamic filter consumers.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// CASE-hash routing, but this assumes build/probe partition indices stay aligned for
/// dynamic filter consumers.
/// CASE-hash routing, but this assumes build/probe partition indices stay aligned, otherwise the query might have correctness problems.

I would also add a small example to clarify what “aligned” means -- for example, ranges 0–5 on partition 0 on both the build and probe sides, and so on.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Example will epxlain things clearer here. Monodraw is a good tool for this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do

plan = if let Some(hash_join) = plan.as_any().downcast_ref::<HashJoinExec>()
&& matches!(hash_join.mode, PartitionMode::Partitioned)
{
let routing_mode =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For a Partitioned join, is it possible to have one side perserving partitioning and not the other (have RepartitionExec on one side only)? This would be a misuse from the API? if so, should we throw an error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch

Copy link
Contributor

@NGA-TRAN NGA-TRAN left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely the right approach. I let Lia and DF committers review the details. Great work, Gene

Comment on lines +1002 to +1003
/// CASE-hash routing, but this assumes build/probe partition indices stay aligned for
/// dynamic filter consumers.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Example will epxlain things clearer here. Monodraw is a good tool for this

&& dynamic.has_partitioned_filters()
{
let snapshot = dynamic.current_for_partition(partition)?;
return Ok(Transformed::yes(snapshot));
Copy link
Contributor

@NGA-TRAN NGA-TRAN Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there are 2 joins in the plan like this and dynamic filtering is turned on for both of them but they will be on different partitioning expressions. Will this transform stop and get the right expression? Adding some tests for those cases will help verify that and uncover bugs if any

                  Join2
                 /           \            
       repartition     T3
        /              
      Join2      
   /         \   
T1          T2

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this test, you can use SQLlogic to test it.

Copy link
Contributor Author

@gene-bordegaray gene-bordegaray Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I think a sql logic and a integration test will be good. I want to check the filter itself and the mode that got selected in this case

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

common Related to common crate core Core DataFusion crate datasource Changes to the datasource crate documentation Improvements or additions to documentation optimizer Optimizer rules physical-expr Changes to the physical-expr crates physical-plan Changes to the physical-plan crate proto Related to proto crate sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants